Abstract

These notes contain, among other things, a proof that the average running time of an easy solution to the satisfiability problem for propositional calculus is, under some reasonable assumptions, linear (with constant 2) in the size of the input. Moreover, some suggestions are made about criteria for the tractability of complex algorithms. In particular, it is argued that the probability distribution on the whole input space of an algorithm is a non-negligible factor in estimating whether the algorithm is tractable or not.


1 Introduction 

It is not unusual to hear computer professionals or students questioning the practical value of asymptotic complexity measures. To be honest, there is a lot of evidence of occasional discrepancies between the asymptotic and the actual behavior of algorithms, for example in the areas of sorting and multiplication. After all, authors (like Aho, Hopcroft, and Ullman in [AHU74], end of paragraph 1.4) typically discourage the reader from drawing too many conclusions from the fact that the running time of an algorithm is or is not in a certain O- or Ω-class.

The theory of the complexity of algorithms is logical and clear: its methods do not seem to involve any unintentional error, since mathematics itself has stamped its results with quod erat demonstrandum. So, if it is so good, why is it so bad? To investigate this paradox, let us take a closer look at the motivations for dealing with asymptotic rather than actual complexities.

There are two of them: the essential machine independence of the algorithms to be evaluated, and the virtual lack of limits on the size of their inputs. The second implies an expectation that inputs may grow without bound, which is considered the most serious obstacle to the successful termination of a run. This is why the asymptotic complexity of an algorithm has been supposed to characterize its performance on future inputs which may (probably) be brought for processing.

Depending on how the program's complexity is expressed in terms of the size of its input, the existing asymptotic complexity measures fall into two categories: the worst-case and the average-case one. The first seems to reflect an implicit presumption of a malicious gnome who selects (possibly the most troublesome) inputs for the evaluated program. The second does not allow averaging over inputs of different sizes, which causes at least calculational problems. Both of them, apparently fully adequate for evaluating a single program, may fail completely when applied to a comparison: program A may have better overall performance than program B, except for a few isolated "worst" cases in which B is much quicker than A; program C may have substantially better average efficiency than program D, but only for sufficiently large inputs, which may be unlikely in practice. Experience shows that the above scenarios are by no means artificial.

One may suspect that dealing with unforeseeable future events requires some probability theory, and this is indeed the point of view we advocate here: in estimating how much time will be spent on future computations, one should take into account which elements of the input space are more likely and which are less. Moreover, since the way one measures input size does not seem to have much influence on either the actual or the expected running time of any particular program, we see no good reason why expressing averages exclusively as a function of the size of the input should be recognized as a universally satisfactory practice. (On the contrary, we found it rather inadequate in our attempts to evaluate the average running time of the programs considered in this paper.) Therefore we propose modified notions of average running time and of the corresponding O-classes.

In our approach we postpone abstraction from constant factors to a later phase of evaluation. In particular, the classes $O(\bullet^2)$ and $O(100\times\bullet^2)$ are not identical in this paper. If one does not need to deal with complexity on such a concrete level, introducing an appropriate equivalence relation (e.g. imposing $O(\bullet^2) = O(100\times\bullet^2)$) will easily translate the obtained results into the language of complexity classes modulo constants.

In the sequel, we will use an NP-complete problem, namely the satisfiability problem of propositional calculus, as one of the illustrations of our proposal. Before doing this, we start with some theoretical considerations.

We refer the reader to any handbook on measure theory for details concerning measure and probability spaces. An extensive study of the complexity of algorithms, including definitions of O- and Ω-classes, the satisfiability problem, NP-completeness theory, NP-hard problems, and Cook's theorem, may be found in [PS82]. Some striking results about better-than-expected behavior of certain algorithms related to NP-hard problems may be found in [Wil84].

Shannon's counting argument in the context of the complexity of Boolean functions appears in [Weg87].

2 Average running time, O-hierarchy, and tractability of algorithms


Further on, we will use von Neumann's definition of natural numbers, i.e. $0$ is the empty set, and $n+1 = n\cup\{n\}$ $(=\{0,\ldots,n\})$. We denote the set of all numbers by $\omega$, and the set of all of them except $0$ by $\omega^+$. Moreover, we use the symbol $\bullet$ to avoid $\lambda$-expressions. Namely, $f(\bullet)$ means $\lambda x.f(x)$, or in other words, $f$. E.g. $\bullet^3$ denotes the cubic function.
We will fix our attention on an algorithm $P$ with a countable domain $X$ (which we will call the input space), running time $T : X\to\omega^+$, and probability distribution $\mu : X\to[0,1]$, extended to a normed measure $\mu : \mathcal{P}(X)\to[0,1]$ by $\mu(Y) = \sum_{x\in Y}\mu(x)$ (usually, it is assumed that $\mu(X) = 1$).


By the average running time $T^\mu_{avg}$ (we use the superscript $\mu$ as a reminder of the explicit role of the probability distribution here) we understand the function defined for each $Y\subseteq X$ with $\mu(Y) > 0$ as follows:

$$T^\mu_{avg}(Y) = \frac{\sum_{x\in Y}T(x)\times\mu(x)}{\mu(Y)}\qquad\text{(i)}$$

It is easily seen that the above expression defines the expected value of $T(x)$ under the condition $x\in Y$, i.e. with respect to the conditional probability $\mu(x)/\mu(Y)$. So its value tells us how much time, on average, the algorithm $P$ will spend running on a random input $x$, provided it is known that $x\in Y$.
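Definition (i) can be evaluated directly when $Y$ is finite. The following Python sketch is only an illustration under our own assumptions: the toy input space, the dictionaries `T` and `mu`, and the function name `avg_running_time` are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def avg_running_time(T, mu, Y):
    """T^mu_avg(Y) as in (i): the expected T(x) under mu, conditioned on x in Y.

    T maps inputs to running times, mu maps inputs to probabilities,
    and Y is a finite collection of inputs with mu(Y) > 0.
    """
    mass = sum(mu[x] for x in Y)                 # mu(Y)
    if mass == 0:
        raise ValueError("mu(Y) must be positive")
    return sum(T[x] * mu[x] for x in Y) / mass

# Hypothetical toy input space: four inputs with running times and probabilities.
T = {"a": 1, "b": 2, "c": 10, "d": 40}
mu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(1, 8)}
print(avg_running_time(T, mu, ["a", "b"]))   # prints 4/3
print(avg_running_time(T, mu, T.keys()))     # prints 29/4
```

Restricting $Y$ to the cheap inputs lowers the conditional average, as the conditional-probability reading of (i) suggests.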


If one would like to relate running time to the size of the input, a function $f : X\to\omega$, interpreted as an input size measure, comes in handy. It partitions the input space into at most countably many non-empty subspaces, which are the equivalence classes of the relation $=_f$ defined by: $x =_f y$ iff $f(x) = f(y)$. We will use $X^f_n$ as an abbreviation for $\{x\in X\mid f(x) = n\}$ for any $n\in\omega$. Under such conventions, the relative average running time $T^\mu_f$ of $P$ is usually defined by

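$$T^\mu_f(n) = \begin{cases}0 & \text{if } \mu(X^f_n) = 0,\\ \dfrac{1}{\mu(X^f_n)}\displaystyle\sum_{x\in X^f_n}T_x(n)\times\mu(x) & \text{otherwise,}\end{cases}\qquad\text{(ii)}$$

where $T_x$ satisfies, for all $x\in X$ (or at least for those with $\mu(x)\neq 0$), $T_x(f(x)) = T(x)$. One can see that for each $n\in\omega$ such that $\mu(X^f_n)\neq 0$, $\sum_{x\in X^f_n}T_x(n)\times\mu(x) = \sum_{x\in X^f_n}T(x)\times\mu(x)$; hence

$$T^\mu_f(n) = T^\mu_{avg}(X^f_n).$$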
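In the classic setting the averaging is done class by class. The sketch below computes the values of (ii) for a finite input space; the data and the name `relative_avg_by_size` are hypothetical, and classes of measure zero are simply omitted instead of being assigned a value. Each result coincides with $T^\mu_{avg}(X^f_n)$.

```python
from collections import defaultdict
from fractions import Fraction

def relative_avg_by_size(T, mu, f):
    """Return {n: T^mu_f(n)} for every size class X^f_n with mu(X^f_n) > 0.

    Each value is sum_{x in X^f_n} T(x) * mu(x) / mu(X^f_n), i.e. T^mu_avg(X^f_n).
    """
    weighted = defaultdict(Fraction)   # n -> sum of T(x) * mu(x) over X^f_n
    mass = defaultdict(Fraction)       # n -> mu(X^f_n)
    for x in T:
        n = f(x)
        weighted[n] += T[x] * mu[x]
        mass[n] += mu[x]
    return {n: weighted[n] / mass[n] for n in mass if mass[n] != 0}

# Hypothetical inputs: bit strings, with f = len as the size measure.
T = {"0": 1, "1": 3, "00": 4, "01": 8}
mu = {x: Fraction(1, 4) for x in T}
print(relative_avg_by_size(T, mu, len))   # {1: Fraction(2, 1), 2: Fraction(6, 1)}
```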



Unlike the classic case, where averaging of the relative running time is allowed only over the equivalence classes $X^f_n$, we admit the general case, i.e. we assume that the relative average running time may be relativized to one partition of $X$ but averaged over another (one may think: orthogonal) one. However, instead of disentangling the dependency between the average value of $T(x)$ and the average size of $x$, which does not seem simple, we will define directly what it means that the size-related average running time of $P$ is in the class $O(F)$ for some function $F : \omega\to\omega^+$. So, let $\alpha : X\to\omega$ be such a partition, with corresponding equivalence classes $X^\alpha_n$. We say that $T^\mu_{f,\alpha}\in O(F)$ iff for each $n$ such that $\mu(X^\alpha_n)\neq 0$,

$$\sum_{x\in X^\alpha_n}\frac{T(x)}{F(f(x))}\times\mu(x)\le\mu(X^\alpha_n).\qquad\text{(iii)}$$

This means that the expected value of the quotient $T(x)/F(f(x))$ over the set $X^\alpha_n$ does not exceed $1$.
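When the classes are finite, condition (iii) is a finite check per class. A minimal sketch under that assumption; the toy inputs (pairs of a size and a number of variables, with running time $2^{\text{vars}}\times\text{size}$, loosely echoing section 3) and all function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def in_O_class(T, mu, f, alpha, F):
    """Check condition (iii): for each class X^alpha_n with mu(X^alpha_n) > 0,
    sum_{x in X^alpha_n} T(x) / F(f(x)) * mu(x) <= mu(X^alpha_n)."""
    lhs = defaultdict(Fraction)    # n -> left-hand side of (iii)
    mass = defaultdict(Fraction)   # n -> mu(X^alpha_n)
    for x in T:
        n = alpha(x)
        lhs[n] += Fraction(T[x], F(f(x))) * mu[x]
        mass[n] += mu[x]
    return all(lhs[n] <= mass[n] for n in mass if mass[n] > 0)

def size(x):
    return x[0]

def nvars(x):
    return x[1]

# Hypothetical inputs x = (size, number_of_variables) with T(x) = 2**vars * size.
X = [(4, 1), (8, 1), (8, 2), (16, 2)]
T = {x: 2 ** nvars(x) * size(x) for x in X}
mu = {x: Fraction(1, 4) for x in X}
print(in_O_class(T, mu, size, nvars, lambda n: n ** 3))   # True
print(in_O_class(T, mu, size, nvars, lambda n: n))        # False
```

With $F = \bullet$ the class of inputs with one variable already violates (iii): its left-hand side is $1$ while $\mu(X^\alpha_1) = 1/2$.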

One may check that in the usual case, where $\alpha$ and $f$ coincide, $T^\mu_{f,f}\in O(F)$ iff for each $n$ such that $\mu(X^f_n)\neq 0$, $T^\mu_{avg}(X^f_n)\le F(n)$, that is to say, $T^\mu_f(n)\le F(n)$. Thus our definition is a proper generalization of the O-hierarchy of relative average running times.

It is not necessary to understand $f$ as a measure of input size. We may think of $f$ as the running time of another program $Q$ with input space $X$. Under this interpretation, $T^\mu_{f,\alpha}\in O(F)$ means that $F$ is an average upper bound on the proportionality factor between the running time of $P$ and the running time of $Q$ over each class $X^\alpha_n$. E.g. if $c$ is a constant then $T^\mu_{f,\alpha}\in O(c\times\bullet)$ means that in each $X^\alpha_n$, $Q$ is on average at most $c$ times quicker than $P$. If one insists on referring to the ordinary length of the input, it may be measured by the time which a simple rewriting program spends on it.

Let us remind the reader that our intention is, at least at the earlier stages of evaluation, not to abstract from the constant factor neglected in the classic definition of the O-hierarchy. This is why the coefficient of $F(f(x))$ in (iii) is $1$. Moreover, instead of dealing with asymptotic behavior, we purposely introduced measures of expected behavior, which involve all possible inputs, so (iii) must hold for all $n$, not only for those greater than some $n_0$.


Finally, we define the notion of an algorithm's tractability. We call $P$ tractable over $Y\subseteq X$ iff

$$T^\mu_{avg}(Y) < \infty,\qquad\text{(iv)}$$

which means that the expected running time of $P$ is finite, provided inputs are restricted to $Y$.

It follows from the above definition that a linear algorithm (i.e. one with linear running time) may fail to be tractable at the same time as an exponential one is tractable, albeit for different probability distributions. To see this possibility, let $X = \omega$, $T(n) = n$, and $S(n) = 2^n$. If $\mu(n)$ is proportional to $n^{-2}$ for $n\in\omega^+$ (with constant $c$, and $\mu(0) = 0$), and $\nu(n)$ is proportional to $2^{-2n}$ (with constant $d$), then $T^\mu_{avg}(\omega) = c\times\sum_{i\in\omega^+}\frac{1}{i} = \infty$ and $S^\nu_{avg}(\omega) = d\times\sum_{i\in\omega}2^{-i} = 2d < \infty$.
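The contrast can be checked numerically. In this sketch the function names are ours, and the normalizing constants are our own computation: $c = 6/\pi^2$ because $\sum_{n\ge 1}n^{-2} = \pi^2/6$, and $d = 3/4$ because $\sum_{n\ge 0}4^{-n} = 4/3$. So $S^\nu_{avg}(\omega) = 2d = 1.5$, while the partial sums for $T$ grow like $c\ln N$.

```python
import math

def partial_linear(N):
    """Partial sum of T^mu_avg(omega) with T(n) = n and mu(n) = c / n**2 for n >= 1.

    The sum equals c * (1 + 1/2 + ... + 1/N), which grows without bound."""
    c = 6 / math.pi ** 2
    return sum(n * c / n ** 2 for n in range(1, N + 1))

def partial_exponential(N):
    """Partial sum of S^nu_avg(omega) with S(n) = 2**n and nu(n) = d * 4**(-n) for n >= 0.

    The terms are d * 2**(-n), so the sums approach 2d = 1.5.
    Intended for N < 1024, since 2.0**n overflows beyond that."""
    d = 0.75
    total = 0.0
    for n in range(N + 1):
        S_n = 2.0 ** n          # running time S(n)
        nu_n = d * 0.25 ** n    # probability nu(n)
        total += S_n * nu_n
    return total

for N in (10, 60):
    print(N, partial_linear(N), partial_exponential(N))
print(partial_linear(10 ** 5))   # keeps growing as N increases
```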

One may notice that no input size measure is explicitly present in the definition of tractability. This is consistent with the simple observation that how long it takes to complete a run does not depend on how one measures the size of the corresponding input. It should be noted, however, that this definition, natural from the mathematical point of view, may be somewhat impractical in certain cases. Clearly, if $T^\mu_{avg}(X) = \infty$ then you may expect the worst. But if not? The two statements $T^\mu_{avg}(X) < 45$ sec. and $T^\mu_{avg}(X) < 30{,}000$ yrs., both implying the tractability of the program in question, have quite different informational content. Because in our approach we did not abstract from constant factors while measuring a program's complexity, our method may also be applied to evaluating tractability in a stronger sense, where, say, $T^\mu_{avg}(X) < 100$ hrs. is required. It is quite clear that asymptotic complexity measures do not, in general, support this kind of estimate.

If one is interested in measuring how the actual running time is distributed around its mean, other concepts of probability theory, for instance variance or standard deviation, may be helpful. We will not discuss them in this paper. Let us remark, however, that since $T(x)$ is a non-negative random variable, the probability that $T(x)\ge a$ for $x\in Y$ (where $a$ is a positive constant) does not exceed $\frac{1}{a}\times T^\mu_{avg}(Y)$ (Markov's inequality). So computations longer than, say, $100\times T^\mu_{avg}(Y)$ will occur in $Y$ with frequency at most 1%.
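The 1% figure is an instance of Markov's inequality. A small sketch, with a hypothetical uniform distribution over 100 running times and function names of our own, checks the bound in exact arithmetic; the inequality is stated for $T(x)\ge a$, which also covers $T(x) > a$.

```python
from fractions import Fraction

def conditional_tail(T, mu, Y, a):
    """P(T(x) >= a | x in Y) under mu."""
    mass = sum(mu[x] for x in Y)
    return sum(mu[x] for x in Y if T[x] >= a) / mass

def markov_bound(T, mu, Y, a):
    """Markov's inequality: P(T(x) >= a | x in Y) <= T^mu_avg(Y) / a, for a > 0."""
    mass = sum(mu[x] for x in Y)
    avg = sum(T[x] * mu[x] for x in Y) / mass
    return avg / a

# Hypothetical example: 100 equally likely inputs with running times 1, ..., 100.
Y = range(1, 101)
T = {x: x for x in Y}
mu = {x: Fraction(1, 100) for x in Y}
avg = markov_bound(T, mu, Y, 1)               # with a = 1 the bound is the average itself
print(avg)                                    # 101/2
print(conditional_tail(T, mu, Y, 100 * avg))  # 0: no run takes 100 times the average
```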

For the sake of completeness of the picture, let us state some basic properties relating the introduced notions to each other.
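Property 2.1 If $T^\mu_{avg}(X) < \infty$ then for every countable partition $\alpha$ of the input space $X$ into subsets of positive measure,

$$T^\mu_{avg}(X) = \lim_{n\to\infty}T^\mu_{avg}\Big(\bigcup_{i<n}X^\alpha_i\Big).\qquad\text{(v)}$$

Proof. By the definition (recall that $\mu(X) = 1$),

$$\begin{aligned}
T^\mu_{avg}(X) &= \sum_{i\in\omega}\sum_{x\in X^\alpha_i}T(x)\times\mu(x) = \lim_{n\to\infty}\sum_{i<n}\sum_{x\in X^\alpha_i}T(x)\times\mu(x) = \lim_{n\to\infty}\sum_{x\in\bigcup_{i<n}X^\alpha_i}T(x)\times\mu(x)\\
&= \lim_{n\to\infty}\mu\Big(\bigcup_{i<n}X^\alpha_i\Big)\times T^\mu_{avg}\Big(\bigcup_{i<n}X^\alpha_i\Big) = \lim_{n\to\infty}T^\mu_{avg}\Big(\bigcup_{i<n}X^\alpha_i\Big),
\end{aligned}$$

where the last step uses $\mu(\bigcup_{i<n}X^\alpha_i)\to\mu(X) = 1$. Since $T(x)\times\mu(x)\ge 0$, the partial sums $\sum_{x\in\bigcup_{i<n}X^\alpha_i}T(x)\times\mu(x)$ are non-decreasing in $n$, so their limit is also their supremum. □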
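Property 2.1 can be watched at work on a simple instance. In the hypothetical example below (our own choice, not from the paper: singleton classes, $\mu(x) = 2^{-(x+1)}$, $T(x) = x+1$) the averages over the growing unions approach $T^\mu_{avg}(X) = \sum_{j\ge 1}j/2^j = 2$; because $T$ grows with the class index here, they also increase monotonically.

```python
from fractions import Fraction

def union_average(n):
    """T^mu_avg over the union of X^alpha_0, ..., X^alpha_{n-1} for a hypothetical
    instance: X = omega, alpha(x) = x (singleton classes), mu(x) = 2**-(x+1),
    T(x) = x + 1. Here T^mu_avg(X) = sum_{j>=1} j / 2**j = 2."""
    weighted = sum(Fraction(x + 1, 2 ** (x + 1)) for x in range(n))   # sum of T * mu
    mass = sum(Fraction(1, 2 ** (x + 1)) for x in range(n))           # mu of the union
    return weighted / mass

for n in (1, 2, 5, 20, 60):
    print(n, float(union_average(n)))
```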

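Property 2.2 Let $\mu$ be a normed measure on the input space $X$, let $\alpha$ be a countable partition of $X$ into subsets of positive measure, let $f : X\to\omega$ be a measure of the size of input, and let $F : \omega\to\omega^+$. In such circumstances

$$T^\mu_{f,\alpha}\in O(F)\qquad\text{(vi)}$$

iff for each probability distribution $\nu$ satisfying

$$\nu(x) = c_H\times\frac{H(\alpha(x))}{F(f(x))}\times\mu(x),\qquad\text{(vii)}$$

where $H : \omega\to\omega$, the following inequality holds:

$$T^\nu_{avg}(X)\le(F\circ f)^\nu_{avg}(X).\qquad\text{(viii)}$$

Proof. Let $H : \omega\to\omega$. Since $\nu(X) = 1$, we have:

$$\begin{aligned}
T^\nu_{avg}(X)\le(F\circ f)^\nu_{avg}(X) &\iff \sum_{x\in X}T(x)\times\nu(x)\le\sum_{x\in X}F(f(x))\times\nu(x)\\
&\iff \sum_{x\in X}T(x)\times c_H\times\frac{H(\alpha(x))}{F(f(x))}\times\mu(x)\le\sum_{x\in X}F(f(x))\times c_H\times\frac{H(\alpha(x))}{F(f(x))}\times\mu(x)\\
&\iff \sum_{n\in\omega}c_H\times H(n)\times\sum_{x\in X^\alpha_n}\frac{T(x)}{F(f(x))}\times\mu(x)\le\sum_{n\in\omega}c_H\times H(n)\times\mu(X^\alpha_n).\qquad\text{(ix)}
\end{aligned}$$

If (vi) is true then, by (iii), each inner sum on the left-hand side of (ix) is at most $\mu(X^\alpha_n)$, so (ix) holds and we get (viii).

For the proof of the converse implication, let us assume (viii) and take as $H$ in (vii) the characteristic function of the set $\{m\}$, where $m\in\omega$. In this case (ix) reduces to

$$\sum_{x\in X^\alpha_m}\frac{T(x)}{F(f(x))}\times\mu(x)\le\mu(X^\alpha_m),$$

which, since $m$ is arbitrary, gives (vi). □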

Let us note here that the constant $c_H$ in (vii) is uniquely determined by $H$, since $\nu(X) = 1$. Moreover, if $f = \alpha$ then the factor $F(f(x))$ in (vii) may be omitted.

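Property 2.3 Let $\alpha$ be a countable partition of the input space $X$, let $f : X\to\omega$ be a measure of the size of input, let $\mu$ be a measure normed on each $X^\alpha_n$ (i.e. $\mu(X^\alpha_n) = 1$ for all $n\in\omega$), and let $F : \omega\to\omega^+$ and $H : \omega\to\omega$. If

$$T^\mu_{f,\alpha}\in O(F)\qquad\text{(x)}$$

then for every probability distribution $\nu$ satisfying

$$\nu(x)\le\frac{H(\alpha(x))}{F(f(x))}\times\mu(x)\qquad\text{(xi)}$$

the following implication holds:

$$\sum_{n\in\omega}H(n) < \infty\implies T^\nu_{avg}(X) < \infty.\qquad\text{(xii)}$$

Proof. (x) means that for each $n\in\omega$:

$$\sum_{x\in X^\alpha_n}\frac{T(x)}{F(f(x))}\times\mu(x)\le 1.\qquad\text{(xiii)}$$

Hence, by (i) and $\nu(X) = 1$,

$$T^\nu_{avg}(X) = \sum_{x\in X}T(x)\times\nu(x)\le\sum_{n\in\omega}\sum_{x\in X^\alpha_n}T(x)\times\frac{H(n)}{F(f(x))}\times\mu(x) = \sum_{n\in\omega}H(n)\times\sum_{x\in X^\alpha_n}\frac{T(x)}{F(f(x))}\times\mu(x)\le\sum_{n\in\omega}H(n)$$

(the last step by (xiii)), that is to say, $T^\nu_{avg}(X)\le\sum_{n\in\omega}H(n)$, which gives us (xii). □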
The above properties are useful in estimating the tractability of algorithms. Property 2.1 gives us a tool for the direct calculation of $T^\mu_{avg}(X)$. Using it, one may also investigate the rate of growth of $T^\mu_{avg}(\bigcup_{i<n}X^\alpha_i)$ as a function of $n$, which may be useful if $T^\mu_{avg}(X)$ is infinite, or finite but prohibitively large. Putting $\alpha = f$, one can use known facts about average running time in the classic sense in estimating tractability. However, it may be somewhat difficult to discover a useful formula describing $T^\mu_{avg}(\bigcup_{i<n}X^f_i)$. Property 2.2 allows estimates of tractability in all cases where the behavior of $F\circ f$ is known. Property 2.3 (being, as a matter of fact, a generalization of Property 2.1) may prove suitable in cases where Property 2.1 is not. It allows local analysis (i.e. in the subspaces $X^\alpha_n$), which, using this property, may be extended to the whole input space.

3 Complexity of tabulating program 


As the first example of an application of the introduced notions, let us evaluate the complexity of a program which, given a sentence of propositional calculus, tabulates the Boolean function defined by that sentence. The problem of such tabulation is NP-hard.

Even relatively simple algorithms (such as one that merely rewrites its input to the output) may be intractable if the probability distribution does not decrease fast enough as the input size grows. Therefore, to have a tractable instance of the problem, one has to impose some condition on the rate at which the probability distribution fades. Surprisingly, a relatively modest condition will suffice to this end.

We will start from the input space $X$ containing binary representations (using e.g. ASCII or EBCDIC codes) of all propositional sentences in reverse Polish form that are composed of some countably infinite set of propositional variables and any complete set of logical connectives (e.g. $\vee$, $\wedge$, and $\neg$). As the input size measure $f$ we adopt the length (in bits) of this representation. As the orthogonal partition $\alpha$ we use the number $\alpha(x)$ of propositional variables appearing in the input $x$ ($\alpha$ and $f$ are not fully independent, since $f(x)$ cannot be less than $\alpha(x)$; we will not use this fact, however). We assume that the running time of the program on any input $x$ is equal to $2^{\alpha(x)}\times f(x)$, measured in some abstract units of time. It is quite obvious that there exists an algorithm with exactly this "efficiency": if it runs too fast, it delays printing the answer until the time $2^{\alpha(x)}\times f(x)$ has been exhausted. Of course, one can probably construct a faster program, but this one will suffice for our purposes. It is perhaps paradoxical, but nevertheless clear, that only the tiny inputs cause problems with the relative efficiency of our algorithm, since for large inputs $x$ of size greater than $2^{\alpha(x)}$ it has quite good, linear performance. On the other hand, the number of such tiny inputs is so small in comparison with the number of all non-equivalent propositions of minimal length that they may be unable to lead us away from the polynomial average hierarchy.
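A naive tabulating program of the kind assumed above can be sketched as follows. It is a hypothetical illustration, not the paper's program: sentences are lists of reverse-Polish tokens rather than bit strings, only $\vee$, $\wedge$, $\neg$ are supported (spelled `or`, `and`, `not`), and there is no artificial delay. Its cost is $2^{\alpha(x)}$ passes over the input, i.e. proportional to $2^{\alpha(x)}$ times the number of tokens.

```python
from itertools import product

CONNECTIVES = {"not", "and", "or"}

def evaluate(rpn, env):
    """Evaluate a reverse-Polish sentence under the assignment env (name -> bool)."""
    stack = []
    for tok in rpn:
        if tok == "not":
            stack.append(not stack.pop())
        elif tok in ("and", "or"):
            right, left = stack.pop(), stack.pop()
            stack.append((left and right) if tok == "and" else (left or right))
        else:
            stack.append(env[tok])      # a propositional variable
    (value,) = stack                    # a well-formed sentence leaves exactly one value
    return value

def tabulate(rpn):
    """Return the sorted variables and the truth table: one row per assignment,
    i.e. 2**alpha(x) rows, each computed by a full pass over the input."""
    variables = sorted({tok for tok in rpn if tok not in CONNECTIVES})
    table = [(bits, evaluate(rpn, dict(zip(variables, bits))))
             for bits in product((False, True), repeat=len(variables))]
    return variables, table

# (p or q) and not p, i.e. "p q or p not and" in reverse Polish form
variables, table = tabulate(["p", "q", "or", "p", "not", "and"])
print(variables)                        # ['p', 'q']
print([value for _, value in table])    # [False, True, False, False]
```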

We will split each $X^\alpha_n$ (the set of all propositions of $X$ with $n$ propositional variables) into a family of its subsets $Y^n_0, Y^n_1, \ldots, Y^n_i, \ldots$, so that $Y^n_0$ consists of some sort of shortest sentences of $X^\alpha_n$, $Y^n_1$ of the same sort of sentences of $X^\alpha_n\setminus Y^n_0$, and so on. Namely, we define a function $\min : \mathcal{P}(X)\to\mathcal{P}(X)$ by:

(i) for every element of $Y\subseteq X$ there exists an element of $\min(Y)$ logically equivalent to it;

(ii) for every element $x$ of $\min(Y)$ and every element $y$ of $Y$, if $x$ is logically equivalent to $y$ then $f(x)\le f(y)$;

(iii) $\min(Y)$ is a minimal set satisfying (i) and (ii).

To demonstrate the existence of such $\min(Y)$ one has to make use of the axiom of choice: from each equivalence class of logical equivalence on $Y$ pick an element $x$ with the smallest possible value of $f(x)$. The set constructed this way automatically satisfies condition (iii).

Now for each $n\in\omega$ put $Y^n_0 = \min(X^\alpha_n)$ and $Y^n_{i+1} = \min(X^\alpha_n\setminus\bigcup_{j\le i}Y^n_j)$. Of course, we have

$$X^\alpha_n = \bigcup_{i\in\omega}Y^n_i\qquad\text{(iv)}$$

and for any $i\neq j$, $Y^n_i\cap Y^n_j = \emptyset$. Let us estimate lower bounds for the lengths of the codes of elements of $Y^n_i$. Each $Y^n_i$ contains a number of elements equal to the cardinality of the Lindenbaum algebra with $n$ generators or, equivalently, of the Boolean algebra of functions of $n$ variables, that is to say, $2^{2^n}$. Let us assume that the probability distribution $\mu$ assigns the same value to all elements of $Y^n_i$. A semantic argument based on the one-to-one correspondence between the elements of $Y^n_i$ and the elements of the algebras mentioned above shows that this assumption is reasonable. It will enable us to apply Shannon's counting argument.

To evaluate $\sum_{x\in Y^n_i}\frac{T(x)}{f^3(x)}\times\mu(x)$, which is equal to $\sum_{x\in Y^n_i}\frac{2^n}{f^2(x)}\times\mu(x)$, let us observe that for every function $g : X\to\omega^+$ such that $g(x)\le f(x)$ for all $x$, the inequality $\sum_{x\in Y^n_i}\frac{2^n}{f^2(x)}\times\mu(x)\le\sum_{x\in Y^n_i}\frac{2^n}{g^2(x)}\times\mu(x)$ holds. Therefore we may safely assume that each $Y^n_i$ consists of the $2^{2^n}$ shortest binary codes, which gives an absolute lower bound on $f$ for all the $Y^n_i$ together. In this case $Y^n_i$ is composed of all the codes of length $\le 2^n-1$ and of one code of length $2^n$.

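We have:

$$\sum_{x\in Y^n_i}\frac{T(x)}{f^3(x)}\times\mu(x) = \sum_{x\in Y^n_i}\frac{2^n}{f^2(x)}\times\mu(x) = 2^n\times\mu(y_0)\times\sum_{x\in Y^n_i}\frac{1}{f^2(x)}\le 2^n\times\mu(y_0)\times\Big(\sum_{k=1}^{2^n-1}\frac{2^k}{k^2}+\frac{1}{2^{2n}}\Big)\le\mu(y_0)\times 2^{2^n} = \mu(Y^n_i),$$

where $y_0$ is any element of $Y^n_i$ and the second inequality is an elementary estimate of the sum $\sum_k 2^k/k^2$. From (iv) it follows that

$$\sum_{x\in X^\alpha_n}\frac{T(x)}{f^3(x)}\times\mu(x) = \sum_{i\in\omega}\sum_{x\in Y^n_i}\frac{T(x)}{f^3(x)}\times\mu(x)\le\sum_{i\in\omega}\mu(Y^n_i) = \mu\Big(\bigcup_{i\in\omega}Y^n_i\Big) = \mu(X^\alpha_n).$$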

According to our definition of O-class, this means that $T^\mu_{f,\alpha}\in O(\bullet^3)$.
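The per-class bound above rests on an elementary estimate of $\sum_k 2^k/k^2$. The sketch below (function names ours) checks that estimate in exact rational arithmetic under the shortest-code model, with the empty code left out. It confirms the estimate for $n = 0$ and $2\le n\le 8$, but not for $n = 1$, where the left-hand side equals $9/2$ against $2^{2^1} = 4$; for $n = 1$ the bound therefore does not follow from this estimate, and the check says nothing about $n > 8$.

```python
from fractions import Fraction

def shortest_code_bound(n):
    """2**n * (sum_{k=1}^{N-1} 2**k / k**2 + 1 / N**2) with N = 2**n.

    Models Y^n_i as 2**k binary codes of each length k = 1, ..., N-1
    plus one code of length N (the empty code is ignored)."""
    N = 2 ** n
    s = sum(Fraction(2 ** k, k * k) for k in range(1, N)) + Fraction(1, N * N)
    return N * s

def estimate_holds(n):
    """Is 2**n * (...) <= 2**(2**n), i.e. is the class sum at most mu(Y^n_i) / mu(y_0)?"""
    return shortest_code_bound(n) <= 2 ** (2 ** n)

for n in range(0, 8):
    print(n, estimate_holds(n))
```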

Applying Property 2.3 and taking into account that $\alpha(x)\le f(x)$, we conclude that if for every $x$

$$\nu(x)\le c\times f^{-d}(x)\times\frac{\mu(x)}{\mu(X^\alpha_{\alpha(x)})},\quad\text{where } d > 4,$$

then $T^\nu_{avg}(X) < \infty$.

We were not able to draw this conclusion using Property 2.1 alone, which suggests that our generalized notion of the O-hierarchy of average running times may be more useful than the classic one.

4 Complexity of the satisfiability problem 


The satisfiability problem of propositional calculus may be formulated as follows.

Given a sentence $\varphi$ of propositional calculus, decide whether there exists a truth-value assignment to its propositional variables making $\varphi$ true.

All known deterministic solutions to the satisfiability problem have exponential worst-case time complexity. However, the question of the existence of a polynomial solution still remains open. If the answer is "yes" then, as Cook has shown, every problem which may be solved non-deterministically in polynomial worst-case time can also be solved deterministically in polynomial worst-case time. This is the celebrated P=NP problem.

Instead of investigating the worst-case running time of the quickest solution of the satisfiability problem, we will answer the more practical question of its tractability. The positive result we have been able to achieve in this respect makes, in our opinion, the P=NP problem slightly less dramatic.

One may expect that testing satisfiability should be easier than tabulating a Boolean function. Indeed, for all but unsatisfiable sentences (those describing the constant false Boolean function) one may stop trying possible assignments once the first satisfying one has been found. So our program stops either when it has found an assignment making its input sentence true or when it has examined all possible assignments unsuccessfully.

How much time will this save us on average? We will show that it saves surprisingly much, as follows from an elementary property of the subsets of the set $M = \{0,\ldots,M-1\}$: assuming a fair probability distribution on $\mathcal{P}(M)$, the expected value of the minimal element of a random subset of $M$ plus one (with the minimum of the empty subset taken to be $M$), which is the same as the expected number of tosses of a coin until heads appears, is less than 2, no matter how large $M$ is. A qualitatively similar observation can be found in [Wil84], pages 216-221, where the author proves that the average number $N$ of nodes in the backtrack search tree of a random graph subjected to coloring with at most $n$ colors may be approximated regardless of the size of the graph; e.g. if $n = 3$ then $N\approx 197$.
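The elementary property of random subsets can be confirmed by brute force for small $M$. In this sketch (names ours) the minimum of the empty subset is taken to be $M$, as in the definition of $\min_n$ in this section; the exact average of the minimum plus one comes out as $2-2^{-M}$.

```python
from fractions import Fraction
from itertools import combinations

def expected_first_hit(M):
    """Average of min(K) + 1 over all 2**M subsets K of {0, ..., M-1},
    with the minimum of the empty subset taken to be M."""
    total = 0
    for r in range(M + 1):
        for K in combinations(range(M), r):
            total += (min(K) if K else M) + 1
    return Fraction(total, 2 ** M)

for M in range(1, 6):
    print(M, expected_first_hit(M))   # 3/2, 7/4, 15/8, 31/16, 63/32
```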

With each proposition $\varphi_n$ of $n$ propositional variables we associate its model: the set $\mathcal{K}(\varphi_n)$ of all assignments, coded as binary sequences of length $n$, which make $\varphi_n$ true. Since every such sequence represents a number from the interval $[0, 2^n-1]$, models may thus be understood as subsets of $2^n = \{0,\ldots,2^n-1\}$. We assume that the program testing satisfiability scans all numbers $m$ from $0$ to $2^n-1$, verifying for each $m$ whether its binary representation satisfies the sentence in question.
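The scanning program can be sketched like this. It is a hypothetical illustration: sentences are reverse-Polish token lists with `not`, `and`, `or` (our own convention), and the function returns $\min_n(\mathcal{K}(x))$, so that the cost model $T(x) = f(x)\times(\min_n(\mathcal{K}(x))+1)$ can be read as roughly one pass over the input per examined assignment.

```python
def evaluate(rpn, env):
    """Evaluate a reverse-Polish sentence ("not", "and", "or", variable names)."""
    stack = []
    for tok in rpn:
        if tok == "not":
            stack.append(not stack.pop())
        elif tok in ("and", "or"):
            right, left = stack.pop(), stack.pop()
            stack.append((left and right) if tok == "and" else (left or right))
        else:
            stack.append(env[tok])
    (value,) = stack
    return value

def first_model(rpn, variables):
    """Scan m = 0, 1, ..., 2**n - 1 and return min_n of the model: the first m whose
    binary representation (most significant bit = variables[0]) satisfies the
    sentence, or 2**n if none does. The sentence is evaluated min_n + 1 times
    if it is satisfiable and 2**n times otherwise."""
    n = len(variables)
    for m in range(2 ** n):
        env = {v: bool((m >> (n - 1 - j)) & 1) for j, v in enumerate(variables)}
        if evaluate(rpn, env):
            return m
    return 2 ** n

print(first_model(["p", "q", "or"], ["p", "q"]))       # 1  (p = 0, q = 1)
print(first_model(["p", "q", "and"], ["p", "q"]))      # 3  (p = 1, q = 1)
print(first_model(["p", "p", "not", "and"], ["p"]))    # 2  (unsatisfiable: 2**1)
```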

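The time (measured in some abstract units) our program spends on any input $x$ with $n$ propositional variables is given by:

$$T(x) = f(x)\times(\min{}_n(\mathcal{K}(x))+1),$$

where

$$\min{}_n(\mathcal{K}) = \begin{cases}2^n & \text{iff } \mathcal{K} = \emptyset,\\ \min(\mathcal{K}) & \text{otherwise.}\end{cases}$$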


Given a model $\mathcal{K}$, one may think of the set of all propositions $\varphi_n$ for which $\mathcal{K}$ is the model. Let us denote it by $Th(\mathcal{K})$. Using a semantic argument similar to the one in section 3, we assume that, given $n$, a random formula $\varphi$ is equally likely to fall into any class $Th(\mathcal{K})$. In terms of the probability distribution $\mu$ this means that for each $n$ and every two $\mathcal{K},\mathcal{C}\subseteq 2^n$, $\mu(X^\alpha_n\cap Th(\mathcal{K})) = \mu(X^\alpha_n\cap Th(\mathcal{C}))$.

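We have:

$$\begin{aligned}
\sum_{x\in X^\alpha_n}\frac{T(x)}{2\times f(x)}\times\mu(x) &= \sum_{\mathcal{K}\subseteq 2^n}\ \sum_{x\in X^\alpha_n\cap Th(\mathcal{K})}\frac{\min_n(\mathcal{K})+1}{2}\times\mu(x) = \sum_{\mathcal{K}\subseteq 2^n}\frac{\min_n(\mathcal{K})+1}{2}\times\mu(X^\alpha_n\cap Th(\mathcal{K}))\\
&= \frac{\mu(X^\alpha_n)}{2}\times\sum_{\mathcal{K}\subseteq 2^n}\frac{\min_n(\mathcal{K})+1}{2^{2^n}} = \frac{\mu(X^\alpha_n)}{2}\times\Big(\sum_{j=1}^{2^n}\frac{j\times 2^{2^n-j}}{2^{2^n}}+\frac{2^n+1}{2^{2^n}}\Big)\\
&= \frac{\mu(X^\alpha_n)}{2}\times\big(2-2^{-2^n}\big) < \mu(X^\alpha_n).
\end{aligned}$$

Hence $T^\mu_{f,\alpha}\in O(2\times\bullet)$.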

Applying Property 2.3, we conclude that if for every $x$

$$\nu(x)\le c\times f^{-d}(x)\times\frac{\mu(x)}{\mu(X^\alpha_{\alpha(x)})},\quad\text{where } d > 2,$$

then $T^\nu_{avg}(X) < \infty$. Again, we were not lucky enough to get the same result using classic complexity measures.
The same calculations prove the above for the co-problem. Also, the NP-completeness of the satisfiability problem seems to be a rich source of similar estimates for other known complex problems. E.g. the graph coloring with backtrack search mentioned above, and the simplex algorithm (see [Wil84] for its analysis), have been known to have better than exponential average performance.

5 Higher order moments 


Similar calculations show that the $m$-th moment of $T(x)$, that is to say, the average $m$-th power of the running time of the program of section 4, is in $O(2.5\times m^{m+1}\times\bullet^m)$. Namely, for $m\ge 2$ we have:


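$$\begin{aligned}
\sum_{x\in X^\alpha_n}\frac{T^m(x)}{2.5\times m^{m+1}\times f^m(x)}\times\mu(x) &= \sum_{\mathcal{K}\subseteq 2^n}\ \sum_{x\in X^\alpha_n\cap Th(\mathcal{K})}\frac{(\min_n(\mathcal{K})+1)^m}{2.5\times m^{m+1}}\times\mu(x) = \sum_{\mathcal{K}\subseteq 2^n}\frac{(\min_n(\mathcal{K})+1)^m}{2.5\times m^{m+1}}\times\mu(X^\alpha_n\cap Th(\mathcal{K}))\\
&= \frac{\mu(X^\alpha_n)}{2.5\times m^{m+1}}\times\Big(\sum_{j=1}^{2^n}\frac{j^m}{2^j}+\frac{(2^n+1)^m}{2^{2^n}}\Big)\le\frac{\mu(X^\alpha_n)}{2.5\times m^{m+1}}\times C,\qquad\text{where } C = \sum_{i=1}^{\infty}\frac{i^m}{2^i}
\end{aligned}$$

(the last step because $(2^n+1)^m\times 2^{-2^n}\le\sum_{j>2^n}j^m\times 2^{-j}$). On the other hand,

$$\begin{aligned}
C &= \sum_{i=1}^{m}\frac{i^m}{2^i}+\sum_{k=1}^{\infty}\sum_{l=1}^{m}\frac{(km+l)^m}{2^{km+l}} < m^m\times\sum_{i=1}^{m}\frac{1}{2^i}+\sum_{k=1}^{\infty}m\times\frac{((k+1)m)^m}{2^{km}}\\
&< m^m+m^{m+1}\times\sum_{k=1}^{\infty}\Big(\frac{k+1}{2^k}\Big)^m\le m^m\times\Big(1+m\times\sum_{k=1}^{\infty}\Big(\frac{k+1}{2^k}\Big)^2\Big)\\
&< m^m+2\times m^{m+1}\le\tfrac{1}{2}\times m^{m+1}+2\times m^{m+1} = 2.5\times m^{m+1},
\end{aligned}$$

since $\sum_{k=1}^{\infty}((k+1)/2^k)^2 = 53/27 < 2$ and $m\ge 2$.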
Hence $(T^m)^\mu_{f,\alpha}\in O(2.5\times m^{m+1}\times\bullet^m)$.
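The constant $C$ can be computed exactly, which shows how much room the bound $2.5\times m^{m+1}$ leaves. The recurrence used below is our own derivation (stated in the docstring), not taken from the paper; for instance $C = 26$ for $m = 3$, well below $3^3+2\times 3^4 = 189$.

```python
from math import comb

def moment_sum(m):
    """C = S(m) = sum_{i>=1} i**m / 2**i, computed exactly for m >= 1.

    Uses S(m) = 2 + sum_{k=1}^{m-1} comb(m, k) * S(k), obtained by writing
    S(m) = sum_{i>=0} (i+1)**m / 2**(i+1), expanding (i+1)**m binomially and
    using sum_{i>=0} 2**-i = 2."""
    S = {}
    for j in range(1, m + 1):
        S[j] = 2 + sum(comb(j, k) * S[k] for k in range(1, j))
    return S[m]

for m in range(2, 7):
    C = moment_sum(m)
    print(m, C, m ** m + 2 * m ** (m + 1), 2.5 * m ** (m + 1))
```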

There is a surprising coincidence (bearing in mind that the calculations are approximate) between the constant 197 for the 3-coloring backtrack search of [Wil84], page 216, and the constant $3^3+2\times 3^{3+1} = 189$ of our estimate.

6 A grain of salt 


As we have seen in the two previous sections, under rather acceptable assumptions we calculated that the expected running time of the tabulating algorithm does not exceed the cube of the time needed for merely rewriting the input, and that the expected running time of satisfiability testing is less than twice the time spent on reading the input. These results may or may not hold for other probability distributions. Despite its seeming naturalness, the assumption we made in section 3 about $\mu(Y^n_i)$ is rather strong; as a matter of fact, it implies that the probability of a sentence decreases exponentially with the number of distinct variables it contains. (Here Shannon's counting argument fights back.) In our opinion, it cannot be precluded that this is the most likely probability distribution in Artificial Intelligence applications, where the verified sentences are rather far from being random in a lexical sense. However, if we assume that the probability $\mu$ decreases with the $p$-th power of the input's length, then the following example shows that $T^\mu_{avg}(X) = \infty$.

Example 6.1 Consider a language containing all and only the 16 binary connectives (i.e. names of binary Boolean functions). Elementary calculations show that there are

$$\Gamma(N)\times 16^N\times(2^{N+1}-1)$$

different sentences containing exactly $N$ connectives (and therefore $N+1$ propositional variables; the set $V$ of these variables we treat as fixed here), where $\Gamma(N)$ is defined inductively:

$$\Gamma(0) = 1,\qquad\Gamma(n+1) = \sum_{i=0}^{n}\Gamma(i)\times\Gamma(n-i),$$

and denotes the number of different types of sentences one may construct out of $N$ binary connectives. The factor $2^{N+1}-1$ is the number of possible selections from $V$.

The total time of reading all these sentences is equal to $(2N+1)\times\Gamma(N)\times 16^N\times(2^{N+1}-1)$, while the total time of tabulating them is $(2N+1)\times\Gamma(N)\times 16^N\times(3^{N+1}-1)$. Therefore the ratio

$$F(N) = \frac{3^{N+1}-1}{2^{N+1}-1}\times(2N+1)^{-p}\approx(1.5)^{N+1}\times(2N+1)^{-p}$$

cannot have a convergent sum, i.e. $\sum_{N=0}^{\infty}F(N) = \infty$.

The same is true if we assume that the input's probability decreases with the $p$-th power of the number of its propositional variables. □
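The divergence in Example 6.1 is easy to observe numerically. In the sketch below (our own names; $p$ is restricted to non-negative integers so that exact fractions can be used), $\Gamma(N)$ reproduces the Catalan numbers, and the terms $F(N)$ eventually grow without bound for any fixed $p$, so their sum cannot converge.

```python
from fractions import Fraction

def gamma(N):
    """Gamma(N) of Example 6.1: Gamma(0) = 1, Gamma(n+1) = sum_{i=0}^{n} Gamma(i) * Gamma(n-i)."""
    g = [1]
    for n in range(N):
        g.append(sum(g[i] * g[n - i] for i in range(n + 1)))
    return g[N]

def sentence_count(N):
    """Number of sentences with exactly N binary connectives, as counted in Example 6.1."""
    return gamma(N) * 16 ** N * (2 ** (N + 1) - 1)

def ratio(N, p):
    """F(N) = (3**(N+1) - 1) / (2**(N+1) - 1) * (2N + 1)**(-p), for an integer p >= 0."""
    return Fraction(3 ** (N + 1) - 1, 2 ** (N + 1) - 1) / (2 * N + 1) ** p

print([gamma(N) for N in range(6)])          # [1, 1, 2, 5, 14, 42]
print(float(ratio(10, 3)), float(ratio(100, 3)))
```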

The situation becomes diametrically different if one assumes that the language contains all possible $n$-ary connectives for each $n > 0$, with a fair probability distribution over arity classes. This means that each $n$-ary Boolean function has its own name in this language, which may appear in the input with the same likelihood as any other name of an $n$-ary Boolean function. The explosion of connectives and of the lengths of their codes should substantially improve the average relative running time of the tabulating program: one may easily verify that the assumption that $\mu$ is constant on $Y^n_i$ is satisfied in this case.

The situation with the satisfiability problem is not as clear, because we did not use Shannon's counting argument there. Of course, having all possible and equally likely connectives in a language guarantees that the assumption $\mu(X^\alpha_n\cap Th(\mathcal{K})) = \mu(X^\alpha_n\cap Th(\mathcal{C}))$ is met. The more problematic case, where, say, the arity of connectives is bounded (e.g. cannot exceed 2), requires further investigation. The answer to this problem is probably hidden in the following question:

Assuming that all and only $N$-ary connectives are present in the object language, and that any two sentences of the same length have the same probability, given a number $M$, what is the expected value of $\min_{\alpha(x)}(\mathcal{K}(x))$, where $x$ is a random element of $X^\alpha_M$?
7 A comparison of methods


In our opinion, the expected complexity $T^\mu_{avg}(X)$, and in particular its finiteness, is the most adequate complexity measure, provided $P$ is intended for frequent future use and the probability distribution $\mu$ really describes what is going on at its input. The role of other characteristics, like $T_f$, $T^\mu_f$, or $T^\mu_{f,\alpha}$, as well as of asymptotic measures of complexity, is secondary, as they serve as a calculational facility in estimating the value of $T^\mu_{avg}(X)$. Incidentally, knowledge of the worst-case or average running time in the classic sense, or at least of some O-class to which it belongs, may be sufficient to prove that $T^\mu_{avg}(X) < \infty$, using e.g. Property 2.1, but, as we have seen, not necessarily in all cases. On the other hand, the peculiar conviction that $O(\bullet)$ is much better than $O(2^\bullet)$ in circumstances where the probability that the input in the next run will have a given length decreases with the second power of that length seems like preferring rain to mud: both of them cause nontractability problems.

If one insists on having a characterization of how an increase in the size of the input space would affect the tractability of an algorithm, Property 2.1 is a neat tool for the purpose. It may be useful, e.g., for finding the maximal $N$ such that $T^\mu_{avg}(\bigcup_{i<N}X^\alpha_i) < c$, where $c$ is the limit of one's average patience. Since, in general, the values of $T^\mu_{avg}(\bigcup_{i<N}X^f_i)$ and $T^\mu_{avg}(X^f_N)$ may differ from each other considerably, using the classical concept of average running time to this end may lead to false conclusions, besides causing some unnecessary calculational problems which result from restricting $\alpha$ to $f$. Obviously, asymptotic measures may be impractical in such a case, since the $N$ we are interested in may not be large enough, i.e. it may be less than the $n_0$ appearing in the definition of an O-class.

Asymptotic measures may be adequate iff the probability of inputs of small size is appropriately small, which would probably happen in most cases where the probabilities of any two inputs, or at least of any two input lengths, were the same. However, if the input space is infinite, then such a probability distribution is impossible, since in this case

$$\sum_{x\in X}\mu(x) = \begin{cases}0\neq 1, & \text{if the common value } \mu(x) \text{ is } 0,\\ \infty\neq 1, & \text{otherwise.}\end{cases}$$
In our opinion, the above fact is one of the reasons for the discrepancies between the asymptotic and the actual behavior of many algorithms.

Using a worst-case measure in estimating algorithm efficiency is equivalent to using the average case if the probability of non-worst inputs vanishes. This is true under what we call the malicious gnome assumption.


8 Final remarks 


Many people are quite skeptical about the adequacy of probability theory, seemingly expecting somebody to demonstrate the "truthfulness" of its axioms. We do not share their reservations, consciously leaving the choice of the pertinent probability measure to the lucky guessing of the user. This does not mean, however, that we see the results obtained on the basis of this theory as nothing but speculation. In particular, we found it a little surprising, but nevertheless instructive, that under quite realistic assumptions a simple reading program may need, on average, as much as 30% of the running time of a satisfiability checker. This is why we wrote this paper.